Mini Project 3: Visualizing and Maintaining the Green Canopy of NYC
📚Introduction
Many New Yorkers overlook the trees that benefit them and their environment every day. More than 1 million trees (1,093,439 to be exact) are spread across the Big Apple, yet many of them are surrounded by litter. It is easy to forget that these trees are essential: they absorb CO2, shelter birds and squirrels, and provide shade.
While this project is not meant to start a “stop litter” movement, it analyzes trees by City Council district to build a proposal for the NYC Parks Department. Specifically, the goal is to propose a new program and to explain, using visualizations built from official NYC data, why action must be taken on the trees of one specific district.
Setting up code libraries
#Libraries used for this project
#Obtaining data and performing SQL-like commands
library(sf)
library(tidyverse)
library(httr2)
#Data ingestion
library(glue)
library(readxl)
library(tidycensus)
#Display datatables
library(DT)
#Visualization libraries
library(ggplot2)
library(plotly)
library(tidyr)
💽Download NYC City Council District Boundaries
Data was collected from the NYC Department of City Planning using the latest release available when this project was made, 25C. The shoreline-clipped version is used because it can display more trees than the water-area version.
Downloading the Boundary Data
#The following code was inspired by how we ingested data in mp02
#Create directory, if it does not exist already, to store data
if(!dir.exists(file.path("data", "mp03"))){
  dir.create(file.path("data", "mp03"), showWarnings=FALSE, recursive=TRUE)
}

library <- function(pkg){
  ## Mask base::library() to automatically install packages if needed
  ## Masking is important here so downlit picks up packages and links
  ## to documentation
  pkg <- as.character(substitute(pkg))
  options(repos = c(CRAN = "https://cloud.r-project.org"))
  if(!require(pkg, character.only=TRUE, quietly=TRUE)) install.packages(pkg)
  stopifnot(require(pkg, character.only=TRUE, quietly=TRUE))
}

#Define zip file name and source URL
zip_name <- "nycc_25c.zip"
url_path <- "https://s-media.nyc.gov/agencies/dcp/assets/files/zip/data-tools/bytes/city-council/nycc_25c.zip"
#Directory that holds the zip file
zip_path <- "./data/mp03/"
#Full path of the zip file, built once so the existence check and the download agree
zip_file <- paste0(zip_path, zip_name)

#Download the file into the correct directory if it is not already there
if(!file.exists(zip_file)){
  download.file(url = url_path, destfile = zip_file, mode = "wb")
}

unzipped_pathname <- paste0(zip_path, "nycc_25c/")
#Unzip file if necessary
if(!dir.exists(unzipped_pathname)){
  unzip(zip_file, exdir = zip_path, overwrite = TRUE)
}

#Read shp file and store it as the DATA variable
DATA <- sf::st_read(paste0(unzipped_pathname, "nycc.shp"))
#Transform result into WGS 84
DATA <- st_transform(DATA, crs="WGS84")
Raw District Boundary Data Output
#Returning transformed DATA to user
datatable(DATA, style = "bootstrap5", caption = "Raw Data Output")
Explaining the Table
Note: column names were left untouched to show raw data. It may be difficult to understand at first glance.
The datatable may look intimidating, but it provides important information used later on. The most notable columns are Shape_Leng, the length of a district's boundary, and Shape_Area, the size of the district. There are currently 51 districts to work with.
Data Made Easier
The visualization below makes it much easier to see the area being studied. It shows the five boroughs of New York City, with each outlined region representing one City Council district.
#Visualization of area being worked on
ggplot() +
  geom_sf(data = DATA, mapping = aes(geometry = geometry)) +
  theme_bw()
rm(all)
💽Download NYC Tree Points
Since this project focuses on trees, the locations of individual trees are the main data source. The code below downloads the necessary data.
Downloading the Tree Data
#The following code is a modified version of data acquisition from https://michael-weylandt.com/STA9750/archive/AY-2024-SPRING/miniprojects/mini01.html
if(!file.exists("data/mp03/nyc_tree_locations.csv")){
  #URL was modified as per instructions
  ENDPOINT <- "https://data.cityofnewyork.us/resource/hn5i-inap.geojson"
  BATCH_SIZE <- 50000 #Edit if we start to see long computations for visuals. Same with offset.
  OFFSET <- 0
  END_OF_EXPORT <- FALSE
  ALL_DATA <- list()

  while(!END_OF_EXPORT){
    cat("Requesting items", OFFSET, "to", BATCH_SIZE + OFFSET, "\n")

    req <- request(ENDPOINT) |>
      req_url_query(`$limit` = BATCH_SIZE, `$offset` = OFFSET)
    resp <- req_perform(req)

    batch_data <- st_read(resp_body_string(resp))
    # batch_data <- fromJSON(resp_body_string(resp))

    ALL_DATA <- c(ALL_DATA, list(batch_data))

    if(NROW(batch_data) != BATCH_SIZE){
      END_OF_EXPORT <- TRUE
      cat("End of Data Export Reached\n")
    } else {
      OFFSET <- OFFSET + BATCH_SIZE
    }
  }

  ALL_DATA <- bind_rows(ALL_DATA)
  cat("Data export complete:", NROW(ALL_DATA), "rows and", NCOL(ALL_DATA), "columns.")
  write_csv(ALL_DATA, "data/mp03/nyc_tree_locations.csv")
}
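The download loop relies on a simple stopping rule: keep requesting fixed-size batches and stop as soon as a batch comes back short. The sketch below isolates that rule in base R so it can be tested without network access. The `fake_fetch` and `fetch_all` functions are hypothetical stand-ins written for illustration and are not part of httr2 or the NYC Open Data API.

```r
# Hypothetical stand-in for the API: returns up to `limit` rows starting at `offset`
fake_fetch <- function(limit, offset, total_rows = 120) {
  if (offset >= total_rows) {
    return(data.frame(id = integer(0)))
  }
  last <- min(offset + limit, total_rows)
  data.frame(id = seq(offset + 1, last))
}

# Request batches until one comes back with fewer rows than requested
fetch_all <- function(fetch, batch_size) {
  offset <- 0
  batches <- list()
  repeat {
    batch <- fetch(limit = batch_size, offset = offset)
    batches[[length(batches) + 1]] <- batch
    # A short (or empty) batch means the export is exhausted
    if (nrow(batch) < batch_size) break
    offset <- offset + batch_size
  }
  do.call(rbind, batches)
}

all_rows <- fetch_all(fake_fetch, batch_size = 50)
nrow(all_rows)
```

When the total happens to be an exact multiple of the batch size, the loop makes one extra request that returns zero rows, which is what ends it.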
🗺️Mapping NYC Trees
Now that the necessary data has been collected, a visualization will be made to display:
Density of trees in a district
Exact locations of trees
Health of each tree
The visualization will serve as a starting point for deciding which area(s) should be addressed, and why.
Creating graph
#Read in data from the files that were downloaded.
boundaries <- st_read('./data/mp03/nycc_25c')
tree_data <- read.csv('./data/mp03/nyc_tree_locations.csv', stringsAsFactors = FALSE) |>
  filter(!is.na(tpcondition), !is.na(geometry)) |>
  #Rename column to be easier to understand on interactive visualization
  rename("Condition" = tpcondition)

# Parse the "c(lon, lat)" string
tree_data_parsed <- tree_data |>
  mutate(coord_str = trimws(gsub("c\\(|\\)", "", geometry))) |> # Remove "c(" and ")"
  separate_wider_delim(coord_str, delim = ",", names = c("x", "y"), too_few = "align_start") |>
  mutate(
    x = as.numeric(x),
    y = as.numeric(y)
  )

# Create sfc geometry
tree_data$geometry <- st_as_sfc(paste0("POINT(", tree_data_parsed$x, " ", tree_data_parsed$y, ")"))

# Convert to sf
tree_data <- st_as_sf(tree_data)
st_crs(tree_data) <- 4326

#Joining the boundary and tree data
all_data <- st_transform(tree_data, st_crs(boundaries))
all_data <- st_join(all_data, boundaries)
all_data_small <- all_data |>
  slice_head(n=30000)

#Used for later questions
#Count trees per district
tree_counts <- all_data |>
  group_by(CounDist) |>
  summarise(tree_count = n(), .groups = 'drop')

#Add findings to boundaries dataset
boundaries <- boundaries |>
  st_join(tree_counts)

#Store plot in variable to make it interactive in the next code block
#The earlier guides(color = "none") call was dropped because the later guides() call overrode it
tree_plot <- ggplot() +
  geom_sf(data = boundaries, mapping = aes(geometry = geometry, fill = tree_count)) +
  scale_fill_gradient(low = "#F0FFF0", high = "#084511", name = "Tree Count") +
  geom_sf(data = all_data_small, mapping = aes(geometry = geometry, color = Condition), alpha = 0.5, size = 0.3) +
  scale_color_discrete() +
  labs(color = "Condition",
       title = "Street Trees in NYC by City Council District",
       subtitle = "Points represent the trees, shade shows tree count") +
  guides(color = guide_legend(override.aes = list(size = 3))) +
  theme_bw()
tree_plot
#Make plot interactive using plotly
ggplotly(tree_plot)
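Because the tree points were saved to CSV, each geometry comes back as a string such as "c(lon, lat)" and has to be parsed before it can become a spatial object again. As a rough illustration of that parsing step, here is a base R sketch on two made-up strings. The `parse_point_strings` helper is hypothetical and only mirrors the idea of the tidyr-based code, not its exact behavior.

```r
# Example strings in the "c(lon, lat)" format described in the parsing comment
geometry_strings <- c("c(-73.98376, 40.74019)", "c(-73.8, 40.6)")

# Hypothetical helper: turn "c(lon, lat)" strings into numeric x/y columns
parse_point_strings <- function(x) {
  # Strip the leading "c(" and trailing ")", then split on the comma
  inner <- gsub("^c\\(|\\)$", "", trimws(x))
  parts <- strsplit(inner, ",", fixed = TRUE)
  data.frame(
    x = as.numeric(vapply(parts, function(p) trimws(p[1]), character(1))),
    y = as.numeric(vapply(parts, function(p) trimws(p[2]), character(1)))
  )
}

coords <- parse_point_strings(geometry_strings)

# Well-known text strings of the kind that sf::st_as_sfc() accepts
wkt <- sprintf("POINT(%s %s)", coords$x, coords$y)
wkt
```

Anchoring the pattern with `^` and `$` keeps it from touching parentheses elsewhere in the string, which matters only if the input is messier than expected.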
Notes on the Visualization
Note: Due to hardware limitations, the graph plots only the first 30,000 trees as points. The statements below reflect only this visualization and could change with the full data.
Within the five boroughs, Staten Island has the greatest concentration of trees, yet most of these trees have an unknown or dead status. The Bronx has a large number of trees rated in excellent condition, possibly because it is far from JFK airport and sits at the northern edge of the city. Manhattan also has many trees above its southernmost district, which may reflect a deliberate planting effort or simply trees used as decoration to attract tourists. This is an interactive graph, so explore other areas to find different results!
🌲District-Level Analyses of Trees
With the tree points and district boundaries now joined into one data table, more analysis can be done beyond reading the visualization. For instance, it is much easier to determine which district has the most trees directly from the data than to second-guess an answer read off the map.
Note that all trees will be included in the following analyses.
#Remove datasets that repeat tree data. Also remove redundant values
rm(tree_data, tree_data_parsed, unzipped_pathname, url_path, DATA, zip_name, zip_path, ALL_DATA)
Finding District with Most Trees
District with most trees
#Find the district with the most trees
tree_counts <- all_data |>
  group_by(CounDist) |>
  summarise(tree_count = n(), .groups = 'drop') |>
  mutate(Borough = case_when(
    CounDist >= 1 & CounDist <= 10 ~ "Manhattan",
    CounDist >= 11 & CounDist <= 18 ~ "Bronx",
    CounDist >= 19 & CounDist <= 32 ~ "Queens",
    CounDist >= 33 & CounDist <= 48 ~ "Brooklyn",
    CounDist >= 49 & CounDist <= 51 ~ "Staten Island",
    TRUE ~ NA_character_
  )) |>
  arrange(desc(tree_count))

#Create a format_titles helper to make the table columns look nicer. Used in later chunks
#Credit: Professor Michael Weylandt
library(stringr)
format_titles <- function(df){
  colnames(df) <- str_replace_all(colnames(df), "_", " ") |> str_to_title()
  df
}

tree_counts |>
  st_drop_geometry() |>
  slice_head(n=10) |>
  select(CounDist, Borough, tree_count) |>
  format_titles() |>
  rename("Council District" = Coundist) |>
  datatable(style = "bootstrap5", caption = "Top 10 Districts With The Most Trees")
Findings
Council District 51 in Staten Island has the most trees, with 70,965 recorded. Staten Island districts also rank 2nd and 6th, so all 3 of the borough's districts appear in the top 10, suggesting that the borough as a whole is rich in trees.
Many Queens Council Districts also appear, suggesting there is a good chance of seeing trees in whichever Queens neighborhood one enters.
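The borough labels in the table come from fixed district number ranges (1 to 10 for Manhattan, 11 to 18 for the Bronx, and so on). As an alternative sketch of the same lookup, base R's `cut()` can express those ranges as break points. The `district_to_borough` function name is made up for this example.

```r
# Hypothetical helper: map council district numbers to boroughs using the same ranges
district_to_borough <- function(district) {
  as.character(cut(
    district,
    # Intervals are (0,10], (10,18], (18,32], (32,48], (48,51]
    breaks = c(0, 10, 18, 32, 48, 51),
    labels = c("Manhattan", "Bronx", "Queens", "Brooklyn", "Staten Island")
  ))
}

# Districts outside 1..51 fall outside every interval and come back as NA
district_to_borough(c(1, 10, 11, 32, 33, 51, 52))
```

Keeping the ranges in one vector of breaks makes them easier to audit than a chain of comparisons, though either form gives the same labels for whole-numbered districts.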
District with Highest Tree Density
#Use the Shape_Area column to compute tree density per district
density_trees <- all_data |>
  st_drop_geometry() |>
  group_by(CounDist) |>
  summarise(
    Shape_Area = first(Shape_Area), # or sum()/mean() if appropriate
    .groups = "drop"
  ) |>
  left_join(
    tree_counts |>
      st_drop_geometry() |>
      select(CounDist, tree_count, Borough) |>
      distinct(CounDist, .keep_all = TRUE), # Remove duplicate CounDist rows
    by = "CounDist"
  ) |>
  mutate(
    #Shape_Area is in the shapefile's native units (square feet for the foot-based EPSG:2263),
    #so dividing by 1e6 gives millions of square feet; the column name is kept for later chunks
    area_sqkm = as.numeric(Shape_Area) / 1e6,
    tree_density = tree_count / area_sqkm
  ) |>
  arrange(desc(tree_density)) |>
  drop_na() |>
  select(CounDist, Borough, tree_count, area_sqkm, tree_density)

density_trees |>
  format_titles() |>
  rename("Council District" = Coundist) |>
  rename("Area (million sq ft)" = "Area Sqkm") |>
  rename("Tree Density (per million sq ft)" = "Tree Density") |>
  datatable(style = "bootstrap5", caption = "Top 10 Districts With Most Dense Trees") |>
  formatRound(c("Area (million sq ft)", "Tree Density (per million sq ft)"), digits = 3)
Findings
Council District 7 in Manhattan has the densest tree cover, with a recorded density of 283.549. Note that Shape_Area is stored in square feet (the boundary file uses a foot-based projection), so dividing by 1e6 yields trees per million square feet rather than per square kilometer. With a tree count of about 15,000, Council District 7 is the 4th smallest district in NYC and still fits the most trees per unit of area. By comparison, District 50 in Staten Island, the largest district, has a density of about 78, likely because of its size.
Manhattan excels in density, fitting whatever it can into one of the world's best-known boroughs, and its districts appear 5 times in the top 10. That tendency to build densely may also explain why Manhattan districts did so well in this category.
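One caveat worth checking is the unit behind the density figures. Shapefiles in a foot-based projection such as EPSG:2263 typically store Shape_Area in square feet. If that holds here (an assumption, not something verified against the file), dividing by 1e6 yields millions of square feet. The sketch below shows the conversion to trees per square kilometer for a hypothetical district with round numbers. The `SQFT_TO_SQKM` constant and `density_per_sqkm` function are names introduced for this example only.

```r
# One US survey foot is defined as 1200/3937 meters
SQFT_TO_SQKM <- (1200 / 3937)^2 / 1e6

# Hypothetical helper: trees per square kilometer from an area given in square feet
density_per_sqkm <- function(tree_count, area_sqft) {
  tree_count / (area_sqft * SQFT_TO_SQKM)
}

# Hypothetical district: 15,000 trees on 52.9 million square feet
trees_per_million_sqft <- 15000 / (52.9e6 / 1e6)
trees_per_sqkm <- density_per_sqkm(15000, 52.9e6)

c(per_million_sqft = trees_per_million_sqft, per_sqkm = trees_per_sqkm)
```

The two figures differ by a factor of roughly 10.8, so the ranking of districts is unaffected; only the unit label changes.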
District with Most Amount of Dead Trees
#Calculating statistics for dead trees
dead_trees <- all_data |>
  st_drop_geometry() |>
  filter(!is.na(Condition), !is.na(CounDist)) |>
  group_by(CounDist) |>
  summarize(
    total_trees = n(),
    total_dead_trees = sum(Condition == 'Dead', na.rm = TRUE),
    fraction_dead_trees = total_dead_trees / total_trees * 100,
    .groups = 'keep'
  ) |>
  left_join(
    tree_counts |>
      st_drop_geometry() |>
      select(CounDist, Borough) |>
      distinct(CounDist, .keep_all = TRUE),
    by = "CounDist"
  ) |>
  select(CounDist, Borough, total_trees, total_dead_trees, fraction_dead_trees) |>
  arrange(desc(fraction_dead_trees))

dead_trees |>
  rename("Council District" = CounDist) |>
  format_titles() |>
  rename("Fraction Dead Trees %" = "Fraction Dead Trees") |>
  datatable(style = "bootstrap5", caption = "Dead Tree Data") |>
  formatRound("Fraction Dead Trees %", digits = 3)
Findings
Council District 32 in Queens has the highest share of dead trees, with about 14.255% of its trees being dead. One possible reason is that Queens generally receives less attention than Manhattan; combined with being a very large borough, this means more maintenance is required. District 32 also lands in the top 10 for total tree count, so there is a great deal of work needed to address those trees.
Interestingly, no Brooklyn districts appear at the top of this list, suggesting Brooklyn either has fewer trees than Queens or maintains them more effectively.
The most common tree species in Manhattan is the Thornless honeylocust, with 17,310 appearances. It is very common across Manhattan: the next most common species, the London planetree, has about 6,000 fewer appearances. Counts for the remaining species quickly drop to 4 digits, then 3, suggesting the Thornless honeylocust may live longer, adapt better to Manhattan's built environment, and thrive compared to other species. More statistics would be needed to verify such a claim.
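To make the dead tree percentage concrete, the calculation is simply the share of rows marked "Dead" within each district. A minimal base R sketch on made-up data (the nine rows below are invented for illustration and are not drawn from the NYC dataset):

```r
# Hypothetical toy data: one row per tree
trees <- data.frame(
  CounDist  = c(32, 32, 32, 32, 30, 30, 30, 30, 30),
  Condition = c("Dead", "Good", "Dead", "Fair", "Good", "Dead", "Good", "Good", "Excellent")
)

# mean() of a logical vector is the share of TRUE values, so this is percent dead per district
pct_dead <- tapply(trees$Condition == "Dead", trees$CounDist, mean) * 100

sort(pct_dead, decreasing = TRUE)
```

Because the percentage divides by each district's own tree total, a district can top this list without having the largest number of dead trees.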
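The code behind the species ranking is not shown in this section, so the following is only a guess at the general approach: tabulate the species column and sort it. The species vector is a tiny invented sample.

```r
# Hypothetical toy sample standing in for the species column of Manhattan trees
species <- c("honeylocust", "planetree", "honeylocust", "oak", "honeylocust", "planetree")

# Count appearances per species, most common first
counts <- sort(table(species), decreasing = TRUE)

top_species <- names(counts)[1]
# Gap between the most common and the runner-up
gap <- counts[[1]] - counts[[2]]

counts
```

On the real data, the same gap calculation would give the difference of about 6,000 between the Thornless honeylocust and the London planetree mentioned above.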
Tree Species Closest to Baruch College
#Create point of Baruch College using longitude and latitude
baruch_point <- st_point(c(-73.98376, 40.74019)) |> #Point is approximated
  st_sfc(crs = 4326) |>
  st_transform(st_crs(all_data))
#EPSG:2263 is NAD83 / New York Long Island (ftUS); the CRS used in mapping
baruch_point <- st_transform(baruch_point, 2263)

#Create buffer around Baruch College (also reduces RAM usage)
#EPSG:2263 measures in US survey feet, so dist = 1000 is 1,000 ft (about 305 m), not 1 km
buffer_1km <- st_buffer(baruch_point, dist = 1000)

#Filter all_data to only those within the buffer
all_data_near <- st_filter(all_data, buffer_1km, .predicate = st_intersects)

#Finding the tree species closest to Baruch College
baruch_species <- all_data_near |>
  filter(!is.na(genusspecies), !is.na(CounDist)) |>
  left_join(
    tree_counts |>
      st_drop_geometry() |>
      select(CounDist, Borough) |>
      distinct(CounDist, .keep_all = TRUE),
    by = "CounDist"
  ) |>
  #Distances are returned in the CRS unit (feet); as.numeric() drops the unit label
  mutate(distance_to_baruch = as.numeric(st_distance(geometry, baruch_point))) |>
  arrange(distance_to_baruch) |>
  head(50) |>
  select(genusspecies, distance_to_baruch, riskrating)

baruch_species |>
  st_drop_geometry() |>
  rename("Species" = genusspecies) |>
  format_titles() |>
  datatable(style = "bootstrap5", caption = "Tree Species Closest to Baruch (1,000 ft radius)") |>
  formatRound("Distance To Baruch", digits = 4)
Findings
The closest tree to Baruch College is a Quercus acutissima (sawtooth oak) at a distance of 54.3122. Because the analysis runs in EPSG:2263, which measures in US survey feet, this distance is in feet (about 16.6 meters). This tree appears healthy, given that it has no Risk Rating.
There also appears to be a relationship between Risk Rating and distance from Baruch College within the search radius (the buffer distance of 1000 is likewise in feet, roughly 305 meters rather than 1 km). Trees farther away were more likely to have a risk rating present. This could indicate that institutions such as Baruch College tend to provide better maintenance for nearby trees than areas with no college around.
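Since distances computed in a projected coordinate system inherit that system's unit, a quick cross-check in meters can be useful. The sketch below uses the haversine great-circle formula in base R, which is a different technique from the planar `st_distance()` call in the analysis and assumes a spherical Earth. The `haversine_m` function and the example tree location are hypothetical.

```r
# Hypothetical helper: great-circle distance in meters on a sphere (mean Earth radius)
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Baruch point from the analysis, and a made-up tree 0.001 degrees of latitude north
baruch <- c(lon = -73.98376, lat = 40.74019)
d_m <- haversine_m(baruch[["lon"]], baruch[["lat"]], -73.98376, 40.74119)

# The same distance expressed in US survey feet (1 ft = 1200/3937 m)
d_ft <- d_m / (1200 / 3937)

c(meters = d_m, feet = d_ft)
```

Comparing a few distances this way is a cheap test of whether a reported number is in feet or meters, since the two differ by a factor of about 3.28.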
🪵NYC Parks Proposal
Project Description
Walking through an area full of dead trees can feel like crossing a barren wasteland. That is what it feels like to enter the Rockaways in District 32. These dead trees must be repurposed in ways that benefit both humans and wildlife, and replaced with new trees that can grow and prosper in their place. Therefore, I propose to replace at least 4,000 trees, remove all stumps, and plant at least 3,500 new trees to renovate District 32 with as small a budget as possible.
District Map
The following visualization provides information on where trees are located in our district.
#Filter for only District 32
#all_data already carries the district attributes from the earlier join, so no second join is needed
dist32 <- all_data |>
  filter(CounDist == 32)
boundary_32 <- boundaries |>
  filter(CounDist.x == 32)

#Create the visualization
ggplot() +
  geom_sf(data = boundary_32, mapping = aes(geometry = geometry)) +
  geom_sf(data = dist32, mapping = aes(geometry = geometry, color = Condition, alpha = Condition), size = 1) +
  #Condition is a character column, so the alpha values are unnamed and applied in order:
  #the first condition is drawn opaque and the rest are faded
  scale_alpha_manual(values = c(1, rep(0.3, 7)), guide = "none") +
  labs(color = "Condition",
       title = "City Council District 32") +
  guides(color = guide_legend(override.aes = list(size = 3))) +
  theme_bw() +
  theme(aspect.ratio = 0.90)
At first glance, most trees are in excellent or good condition, so why worry? The Rockaways at the bottom clearly have healthy trees, but moving upward, tree condition begins to degrade. Ozone Park at the top has many trees in the good category, while the neighborhood on the right has trees that are critical or dead. Even the trees between the top and bottom areas have lost their excellent condition.
Let's also think about the geography. The Rockaways are exposed to more severe weather than the rest of NYC because they are almost entirely surrounded by water. Debris from dead trees can knock out power or damage structures. It also endangers people, from splinters to debris thrown into one's face by the wind. If this is not convincing enough, the numbers might change your mind.
Quantitative Comparison
District 32 has the highest percentage of dead trees in NYC, about 14.2%, with 4,315 dead trees sitting there doing nothing. These statistics will be compared with Districts 30, 42, and 46, since all are adjacent to District 32. District 30 is an outlier because it is not next to water, but it can show how trees perform in a different setting.
The bar graph below provides additional statistics to look at.
#Compares percent of dead trees across selected districts
dist_comparison <- dead_trees |>
  st_drop_geometry() |>
  filter(CounDist %in% c(32, 30, 42, 46)) |>
  left_join(
    density_trees |>
      st_drop_geometry() |>
      select(CounDist, area_sqkm, tree_density) |>
      distinct(CounDist, .keep_all = TRUE),
    by = "CounDist"
  )

ggplot(dist_comparison, aes(x = reorder(factor(CounDist), -fraction_dead_trees), y = fraction_dead_trees, fill = CounDist == 32)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(round(fraction_dead_trees, 2), "%")), hjust = -0.1, size = 3) +
  coord_flip() +
  scale_fill_manual(values = c("FALSE" = "lightgray", "TRUE" = "#e95420"), guide = "none") +
  labs(
    title = "Percentage of Dead Trees Across Districts",
    subtitle = "District 32 is highlighted as the target",
    x = "Council District",
    y = "Percentage of Dead Trees"
  ) +
  ylim(0, 15) +
  theme_bw()
Clearly, District 32 has the highest percentage of dead trees among the selected districts. The other districts may fare better because of stronger community maintenance or because, being less surrounded by water, they offer conditions in which trees thrive. District 30 stands out for coming very close to District 32's dead tree percentage, but District 30 has a far greater tree density and count. In short, this compares a density of 84.4 in District 32 with 136.4 in District 30 (both values are per million square feet, since Shape_Area is stored in square feet rather than square kilometers). District 32 likely has fewer trees because of Rockaway Beach and does not get the attention it deserves to fix its dead tree problem.
Conclusion
If you are still not convinced, take a look at these districts with only dead trees shown:
#Add the 3 neighboring districts
boundary_32 <- boundaries |>
  filter(CounDist.x %in% c(32, 30, 42, 46))
dist32 <- all_data |>
  filter(CounDist %in% c(32, 30, 42, 46), Condition == "Dead")

#Create the visualization
ggplot() +
  geom_sf(data = boundary_32, mapping = aes(geometry = geometry)) +
  geom_sf(data = dist32, mapping = aes(geometry = geometry), size = 0.7, color = "#e1cb7e", alpha = 0.5) +
  labs(title = "Dead Trees in City Council Districts 30, 32, 42, and 46") +
  theme_bw() +
  theme(aspect.ratio = 0.90)
Council District 32 should now stand out as having the highest dead tree percentage. Districts 42 and 46, which are also close to water, have many dead trees as well, but not as densely concentrated as in District 32. Keep in mind that District 32 is very close to JFK airport; without enough trees in excellent condition, residents could be negatively affected because less CO2 is absorbed. We can start small by targeting the most critical neighborhoods and work our way toward replacing the dead trees and stumps with new trees in the future.